### Design choices for `waldCI`
In my implementation of the `waldCI` class, I made the following design choices:
1. **What to store in the class**
I store:
- `mean` (slot type: `numeric`, length 1),
- `sterr` (slot type: `numeric`, length 1),
- `level` (slot type: `numeric`, length 1),
- `digits` (slot type: `integer`, length 1).
The idea is that a Wald confidence interval is fundamentally determined by an estimate and its standard error. The lower and upper bounds for any confidence level can be derived from these via the normal quantile. Therefore, I chose to store `mean` and `sterr` as the “primary” parameters, and treat the interval endpoints as derived quantities.
I also keep a default confidence level `level` in the object, which is used when methods (such as `show`, `lb`, `ub`, `as.numeric`, `contains`, `overlap`) are called without an explicit level argument. The `digits` slot is used to control the number of decimal places printed in the `show` method.
2. **How they are stored (object types)**
All three numerical parameters (`mean`, `sterr`, `level`) are stored as `numeric` scalars. The `digits` slot is stored as an `integer` scalar.
The validity function enforces:
- `sterr` > 0 and finite,
- `level` is finite and lies strictly between 0 and 1,
- `mean` is finite,
- and the implied confidence interval bounds for the default `level` are finite and satisfy `lb < ub`.
3. **Enforcing monotonicity in `transformCI`**
The `transformCI` function takes a *monotone* function `f` and a `waldCI` object. Conceptually, if `f` is monotone on the interval \([L, U]\), then the transformed confidence interval is given by \([f(L), f(U)]\) up to ordering. In the implementation, I:
- Compute the original bounds \([L, U]\) for a given level.
- Apply `f` to the endpoints and take `min(f(L), f(U))` and `max(f(L), f(U))` as the new bounds.
- Optionally (when `check.monotone = TRUE`), I sample a few grid points between `L` and `U`, apply `f` to these points, and check whether all successive differences are non-negative or non-positive. If this simple check fails, I issue a warning that the function may not be monotone on the interval.
This does not guarantee strict mathematical monotonicity, but it is a practical heuristic. I also assume that the user provides a function that is intended to be monotonic; the check is mainly used to catch obviously non-monotone functions.
Overall, this design keeps the internal representation simple (estimate + standard error), while allowing all requested methods to be implemented in a straightforward way.hw5
Information
the Github website for this homework is https://github.com/jiayihu-Jeffery/stats506_hw5
Problem Set #05
Problem 1 - OOP Programming
Create a class to represent Wald-style normal approximation Confidence Intervals. Do this using S4.
For the
waldCIclass, define the following:- A constructor, which takes in either a mean and standard deviation, or a lower and upper bound along with a confidence level. This should be a custom constructor, not
new()orwaldCI(). - A validator.
- A
showmethod, which takes as argumentslevelanddigits. - Accessors:
lb,ub,mean,sterr.lbandubshould take inlevel. - Setters:
lb,ub,mean,sterr. Be sure to validate the resultingwaldCI.lbandubshould take in alevel. - A
containsmethod, returning a logical of whether a value is within a CI of a certainlevel. - An
overlapmethod, that takes in twowaldCI’s and alevel, and returns a logical of whether the two confidence intervals overlap. as.numericto returnc(lb, ub)for a givenlevel.transformwhich takes in a monotonicfunctionand awaldCI, and returns the transformed confidence interval.
- A constructor, which takes in either a mean and standard deviation, or a lower and upper bound along with a confidence level. This should be a custom constructor, not
.z_from_level <- function(level) {
if (length(level) != 1L || !is.numeric(level)) {
stop("'level' must be a numeric scalar.", call. = FALSE)
}
qnorm(1 - (1 - level) / 2)
}
.lb_from_params <- function(mean, sterr, level) {
mean - .z_from_level(level) * sterr
}
.ub_from_params <- function(mean, sterr, level) {
mean + .z_from_level(level) * sterr
}
setClass(
"waldCI",
slots = list(
mean = "numeric",
sterr = "numeric",
level = "numeric",
digits = "integer"
),
prototype = list(
mean = NA_real_,
sterr = NA_real_,
level = 0.95,
digits = 3L
),
validity = function(object) {
msgs <- character()
if (length(object@mean) != 1L || !is.numeric(object@mean)) {
msgs <- c(msgs, "'mean' must be a numeric scalar.")
}
if (length(object@sterr) != 1L || !is.numeric(object@sterr)) {
msgs <- c(msgs, "'sterr' must be a numeric scalar.")
}
if (length(object@level) != 1L || !is.numeric(object@level)) {
msgs <- c(msgs, "'level' must be a numeric scalar.")
}
if (length(object@digits) != 1L || !is.integer(object@digits)) {
msgs <- c(msgs, "'digits' must be an integer scalar.")
}
if (!is.na(object@sterr)) {
if (!(object@sterr > 0)) {
msgs <- c(msgs, "'sterr' must be strictly positive.")
}
if (!is.finite(object@sterr)) {
msgs <- c(msgs, "'sterr' must be finite.")
}
}
if (!is.na(object@mean)) {
if (!is.finite(object@mean)) {
msgs <- c(msgs, "'mean' must be finite.")
}
}
if (!is.na(object@level)) {
if (!(object@level > 0 && object@level < 1)) {
msgs <- c(msgs, "'level' must lie strictly between 0 and 1.")
}
if (!is.finite(object@level)) {
msgs <- c(msgs, "'level' must be finite.")
}
}
if (!is.na(object@digits)) {
if (object@digits < 0L) {
msgs <- c(msgs, "'digits' must be >= 0.")
}
}
if (length(msgs) == 0L &&
is.finite(object@mean) &&
is.finite(object@sterr) &&
is.finite(object@level) &&
object@sterr > 0) {
L <- .lb_from_params(object@mean, object@sterr, object@level)
U <- .ub_from_params(object@mean, object@sterr, object@level)
if (!is.finite(L) || !is.finite(U)) {
msgs <- c(msgs, "CI bounds must be finite.")
} else if (!(L < U)) {
msgs <- c(msgs, "Lower bound must be strictly less than upper bound.")
}
}
if (length(msgs) == 0L) TRUE else msgs
}
)
WaldCI <- function(mean = NULL, sterr = NULL,
lb = NULL, ub = NULL,
level = 0.95,
digits = 3L) {
has_mean <- !is.null(mean)
has_sterr <- !is.null(sterr)
has_lb <- !is.null(lb)
has_ub <- !is.null(ub)
if (has_mean && has_sterr && !has_lb && !has_ub) {
if (length(mean) != 1L || length(sterr) != 1L) {
stop("'mean' and 'sterr' must be scalars.", call. = FALSE)
}
mean <- as.numeric(mean)
sterr <- as.numeric(sterr)
} else if (!has_mean && !has_sterr && has_lb && has_ub) {
if (length(lb) != 1L || length(ub) != 1L) {
stop("'lb' and 'ub' must be scalars.", call. = FALSE)
}
lb <- as.numeric(lb)
ub <- as.numeric(ub)
if (!(lb < ub)) {
stop("'lb' must be strictly less than 'ub'.", call. = FALSE)
}
z <- .z_from_level(level)
mean <- (lb + ub) / 2
sterr <- (ub - lb) / (2 * z)
} else {
stop("Specify either (mean & sterr) or (lb & ub), but not both.",
call. = FALSE)
}
level <- as.numeric(level)
digits <- as.integer(digits)
obj <- new("waldCI",
mean = mean,
sterr = sterr,
level = level,
digits = digits)
validObject(obj)
obj
}
if (!isGeneric("lb")) {
setGeneric("lb", function(object, level) standardGeneric("lb"))
}[1] "lb"
if (!isGeneric("ub")) {
setGeneric("ub", function(object, level) standardGeneric("ub"))
}[1] "ub"
if (!isGeneric("sterr")) {
setGeneric("sterr", function(object) standardGeneric("sterr"))
}[1] "sterr"
if (!isGeneric("level")) {
setGeneric("level", function(object) standardGeneric("level"))
}[1] "level"
if (!isGeneric("lb<-")) {
setGeneric("lb<-", function(object, level, value) standardGeneric("lb<-"))
}[1] "lb<-"
if (!isGeneric("ub<-")) {
setGeneric("ub<-", function(object, level, value) standardGeneric("ub<-"))
}[1] "ub<-"
if (!isGeneric("sterr<-")) {
setGeneric("sterr<-", function(object, value) standardGeneric("sterr<-"))
}[1] "sterr<-"
if (!isGeneric("level<-")) {
setGeneric("level<-", function(object, value) standardGeneric("level<-"))
}[1] "level<-"
if (!isGeneric("mean<-")) {
setGeneric("mean<-", function(x, value) standardGeneric("mean<-"))
}[1] "mean<-"
if (!isGeneric("contains")) {
setGeneric("contains", function(object, x, level) standardGeneric("contains"))
}[1] "contains"
if (!isGeneric("overlap")) {
setGeneric("overlap", function(x, y, level) standardGeneric("overlap"))
}[1] "overlap"
setMethod("lb",
signature(object = "waldCI", level = "missing"),
function(object) {
.lb_from_params(object@mean, object@sterr, object@level)
})
setMethod("lb",
signature(object = "waldCI", level = "numeric"),
function(object, level) {
.lb_from_params(object@mean, object@sterr, level)
})
setMethod("ub",
signature(object = "waldCI", level = "missing"),
function(object) {
.ub_from_params(object@mean, object@sterr, object@level)
})
setMethod("ub",
signature(object = "waldCI", level = "numeric"),
function(object, level) {
.ub_from_params(object@mean, object@sterr, level)
})
setMethod("mean",
signature(x = "waldCI"),
function(x, ...) {
x@mean
})
setMethod("sterr",
signature(object = "waldCI"),
function(object) {
object@sterr
})
setMethod("level",
signature(object = "waldCI"),
function(object) {
object@level
})
setMethod("lb<-",
signature(object = "waldCI", level = "missing", value = "numeric"),
function(object, value) {
lev <- object@level
old_lb <- lb(object, lev)
delta <- value - old_lb
object@mean <- object@mean + delta
validObject(object)
object
})
setMethod("lb<-",
signature(object = "waldCI", level = "numeric", value = "numeric"),
function(object, level, value) {
old_lb <- lb(object, level)
delta <- value - old_lb
object@mean <- object@mean + delta
validObject(object)
object
})
setMethod("ub<-",
signature(object = "waldCI", level = "missing", value = "numeric"),
function(object, value) {
lev <- object@level
old_ub <- ub(object, lev)
delta <- value - old_ub
object@mean <- object@mean + delta
validObject(object)
object
})
setMethod("ub<-",
signature(object = "waldCI", level = "numeric", value = "numeric"),
function(object, level, value) {
old_ub <- ub(object, level)
delta <- value - old_ub
object@mean <- object@mean + delta
validObject(object)
object
})
setMethod("mean<-",
signature(x = "waldCI", value = "numeric"),
function(x, value) {
x@mean <- as.numeric(value)
validObject(x)
x
})
setMethod("sterr<-",
signature(object = "waldCI", value = "numeric"),
function(object, value) {
object@sterr <- as.numeric(value)
validObject(object)
object
})
setMethod("level<-",
signature(object = "waldCI", value = "numeric"),
function(object, value) {
object@level <- as.numeric(value)
validObject(object)
object
})
setMethod("show",
signature(object = "waldCI"),
function(object) {
lev <- object@level
dig <- object@digits
L <- lb(object, lev)
U <- ub(object, lev)
cat("An object of class 'waldCI'\n")
cat(sprintf(" mean = %.*f\n", dig, object@mean))
cat(sprintf(" sterr = %.*f\n", dig, object@sterr))
cat(sprintf(" level = %.*f\n", dig, lev))
cat(sprintf(" CI = [%.*f, %.*f]\n", dig, L, dig, U))
})
setMethod("contains",
signature(object = "waldCI", x = "numeric", level = "missing"),
function(object, x) {
lev <- object@level
L <- lb(object, lev)
U <- ub(object, lev)
x >= L & x <= U
})
setMethod("contains",
signature(object = "waldCI", x = "numeric", level = "numeric"),
function(object, x, level) {
L <- lb(object, level)
U <- ub(object, level)
x >= L & x <= U
})
setMethod("overlap",
signature(x = "waldCI", y = "waldCI", level = "missing"),
function(x, y) {
lev <- x@level
L1 <- lb(x, lev); U1 <- ub(x, lev)
L2 <- lb(y, lev); U2 <- ub(y, lev)
max(L1, L2) <= min(U1, U2)
})
setMethod("overlap",
signature(x = "waldCI", y = "waldCI", level = "numeric"),
function(x, y, level) {
L1 <- lb(x, level); U1 <- ub(x, level)
L2 <- lb(y, level); U2 <- ub(y, level)
max(L1, L2) <= min(U1, U2)
})
setMethod("as.numeric",
signature(x = "waldCI"),
function(x, level = x@level, ...) {
L <- lb(x, level)
U <- ub(x, level)
c(L, U)
})
transformCI <- function(ci, f, level = ci@level,
check.monotone = TRUE) {
if (!inherits(ci, "waldCI")) {
stop("'ci' must be a 'waldCI' object.", call. = FALSE)
}
if (!is.function(f)) {
stop("'f' must be a function.", call. = FALSE)
}
L <- lb(ci, level)
U <- ub(ci, level)
a <- f(L)
b <- f(U)
if (!is.finite(a) || !is.finite(b)) {
stop("Transformed bounds must be finite.", call. = FALSE)
}
if (check.monotone) {
xs <- seq(L, U, length.out = 5L)
ys <- vapply(xs, f, numeric(1))
if (any(!is.finite(ys))) {
warning("Transformed values contain non-finite numbers; monotonicity check skipped.")
} else {
d <- diff(ys)
if (!(all(d >= 0) || all(d <= 0))) {
warning("Function may not be monotone on this interval.")
}
}
}
new_lb <- min(a, b)
new_ub <- max(a, b)
WaldCI(lb = new_lb, ub = new_ub,
level = level,
digits = ci@digits)
}- Use your
waldCIclass to create three objects:
Evaluate the following code:
ci1: (17.2, 24.7), 95%
ci2: mean: 13, standard error: 2.5, 99%
ci3: (27.43, 39.22), 75%
ci1
ci2
ci3
as.numeric(ci1)
as.numeric(ci2)
as.numeric(ci3)
lb(ci2)
ub(ci2)
mean(ci1)
sterr(ci3)
level(ci2)
lb(ci2) <- 10.5
mean(ci3) <- 34
level(ci3) <- .8
contains(ci1, 17)
contains(ci3, 44)
overlap(ci1, ci2)
eci1 <- transformCI(ci1, sqrt)
eci1
mean(transformCI(ci2, log))
ci1 <- WaldCI(lb = 17.2, ub = 24.7, level = 0.95)
ci2 <- WaldCI(mean = 13, sterr = 2.5, level = 0.99)
ci3 <- WaldCI(lb = 27.43, ub = 39.22, level = 0.75)
ci1An object of class 'waldCI'
mean = 20.950
sterr = 1.913
level = 0.950
CI = [17.200, 24.700]
ci2An object of class 'waldCI'
mean = 13.000
sterr = 2.500
level = 0.990
CI = [6.560, 19.440]
ci3An object of class 'waldCI'
mean = 33.325
sterr = 5.125
level = 0.750
CI = [27.430, 39.220]
as.numeric(ci1)[1] 17.2 24.7
as.numeric(ci2)[1] 6.560427 19.439573
as.numeric(ci3)[1] 27.43 39.22
lb(ci2)[1] 6.560427
ub(ci2)[1] 19.43957
mean(ci1)[1] 20.95
sterr(ci3)[1] 5.12453
level(ci2)[1] 0.99
lb(ci2) <- 10.5
mean(ci3) <- 34
level(ci3) <- 0.8
contains(ci1, 17)[1] FALSE
contains(ci3, 44)[1] FALSE
overlap(ci1, ci2)[1] TRUE
eci1 <- transformCI(ci1, sqrt)
eci1An object of class 'waldCI'
mean = 4.559
sterr = 0.210
level = 0.950
CI = [4.147, 4.970]
mean(transformCI(ci2, log))[1] 2.75161
- Show that your validator does not allow the creation of invalid confidence intervals:
negative standard error
lb > ub
Infinite bounds
invalid use of the setters
(Infinite confidence bounds are of course not truely invalid, but we’re just going to ignore them for this case.)
Note that there are a lot of choices to be made here. What are you going to store in the class? How are you going to store them (what object types)? How are you going to enforce the function in transform being monotonic?
There is no right answer to those questions. Make the best decision you can, and don’t be afraid to change it if your decision causes unforeseen difficulties.
You may not use any existing R functions or packages that would trivialize this assignment. (E.g. if you found an existing package that does this, or found a function that checks for overlap between two CIs, that is not able to be used.)
Hint: It may be useful to define other functions that I don’t explicitly ask for.
cat("negative sterr via constructor\n")negative sterr via constructor
try(WaldCI(mean = 0, sterr = -1, level = 0.95))Error in validObject(.Object) :
invalid class "waldCI" object: 'sterr' must be strictly positive.
cat("\nlb > ub via constructor\n")
lb > ub via constructor
try(WaldCI(lb = 5, ub = 1, level = 0.95))Error : 'lb' must be strictly less than 'ub'.
cat("\ninfinite bounds via lb = -Inf\n")
infinite bounds via lb = -Inf
try(WaldCI(lb = -Inf, ub = 1, level = 0.95))Error in validObject(.Object) :
invalid class "waldCI" object: 1: 'sterr' must be finite.
invalid class "waldCI" object: 2: 'mean' must be finite.
cat("\ninfinite bounds via mean = Inf\n")
infinite bounds via mean = Inf
try(WaldCI(mean = Inf, sterr = 1, level = 0.95))Error in validObject(.Object) :
invalid class "waldCI" object: 'mean' must be finite.
cat("\ninvalid setter: negative sterr\n")
invalid setter: negative sterr
ci_ok1 <- WaldCI(mean = 0, sterr = 1, level = 0.95)
try(sterr(ci_ok1) <- -2)Error in validObject(object) :
invalid class "waldCI" object: 'sterr' must be strictly positive.
cat("\ninvalid setter: infinite sterr\n")
invalid setter: infinite sterr
ci_ok2 <- WaldCI(mean = 1, sterr = 0.5, level = 0.9)
try(sterr(ci_ok2) <- Inf)Error in validObject(object) :
invalid class "waldCI" object: 'sterr' must be finite.
cat("\ninvalid setter: level outside (0,1)\n")
invalid setter: level outside (0,1)
ci_ok3 <- WaldCI(mean = 2, sterr = 1, level = 0.95)
try(level(ci_ok3) <- 1)Error in validObject(object) :
invalid class "waldCI" object: 'level' must lie strictly between 0 and 1.
cat("\ninvalid setter: making bounds infinite via mean\n")
invalid setter: making bounds infinite via mean
ci_ok4 <- WaldCI(mean = 3, sterr = 1, level = 0.95)
try(mean(ci_ok4) <- Inf)Error in validObject(x) :
invalid class "waldCI" object: 'mean' must be finite.
Problem 3 - plotly
Repeat problem set 4, question 3 using plotly.
There is no expectation that you produce the exact same plots as last time. You may of course use your plots as last time, or the ones from the problem set 4 solutions, as inspiration for these plots.
These will be graded similar to last time:
- Is the type of graph & choice of variables appropriate to answer the question?
- Is the graph clear and easy to interpret?
- Is the graph publication ready?
library(readr)
library(dplyr)
library(plotly)
covid_states <- read_csv(
"https://github.com/nytimes/covid-19-data/raw/master/rolling-averages/us-states.csv",
col_types = cols()
)
covid_states <- covid_states %>%
mutate(date = as.Date(date))
national_daily <- covid_states %>%
group_by(date) %>%
summarize(cases_avg = sum(cases_avg, na.rm = TRUE), .groups = "drop")
p1 <- plot_ly(
national_daily,
x = ~date,
y = ~cases_avg,
type = "scatter",
mode = "lines"
) %>%
layout(
title = "Daily 7-day average COVID-19 cases in the US (sum over states)",
xaxis = list(title = "Date"),
yaxis = list(title = "7-day average cases (sum over states)")
)
p1state_rates <- covid_states %>%
group_by(state) %>%
summarize(mean_rate = mean(cases_avg_per_100k, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(mean_rate))
top_states <- head(state_rates$state, 3)
bottom_states <- tail(state_rates$state, 3)
states_subset <- covid_states %>%
filter(state %in% c(top_states, bottom_states))
p2 <- plot_ly(
states_subset,
x = ~date,
y = ~cases_avg_per_100k,
color = ~state,
type = "scatter",
mode = "lines"
) %>%
layout(
title = "States with highest and lowest mean case rates per 100k",
xaxis = list(title = "Date"),
yaxis = list(title = "7-day average cases per 100,000"),
legend = list(title = list(text = "State"))
)
p2first_substantial <- covid_states %>%
filter(cases_avg_per_100k >= 1) %>%
group_by(state) %>%
summarize(first_date = min(date), .groups = "drop") %>%
arrange(first_date) %>%
slice(1:5)
first_substantial# A tibble: 5 × 2
state first_date
<chr> <date>
1 Washington 2020-03-15
2 New York 2020-03-18
3 Guam 2020-03-19
4 Louisiana 2020-03-19
5 New Jersey 2020-03-19
early_states_data <- covid_states %>%
filter(state %in% first_substantial$state)
p3 <- plot_ly(
early_states_data,
x = ~date,
y = ~cases_avg_per_100k,
color = ~state,
type = "scatter",
mode = "lines"
) %>%
layout(
title = "Early states to experience substantial COVID activity",
xaxis = list(title = "Date"),
yaxis = list(title = "7-day average cases per 100,000"),
legend = list(title = list(text = "State"))
)
p3